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Abstract 

Expression profiling is one of the most important tools for dissecting biological functions of genes and the upregulation or down- 
regulation of gene expression is sufficient for recreating phenotypic differences. Expression divergence of genes significantly con- 
tributes to phenotypic variations. However, little is known on the molecular basis of expression divergence and evolution among rice 
genotypes with contrasting phenotypes. In this study, we have implemented an integrative approach using bioinformatics and 
experimental analyses to provide insights into genomic variation, expression divergence, and evolution between salinity-sensitive rice 
variety Nipponbare and tolerant rice line Pokkali under normal and high salinity stress conditions. We have detected thousands of 
differentially expressed genes between these two genotypes and thousands of up- or downregulated genes under high salinity stress. 
Many genes were first detected with expression evidence using custom microarray analysis. Some gene families were preferentially 
regulated by high salinity stress and might play key roles in stress-responsive biological processes. Genomic variations in promoter 
regions resulted from single nucleotide polymorphisms, indels (1-1 0 bp of insertion/deletion), and structural variations significantly 
contributed to the expression divergence and regulation. Our data also showed that tandem and segmental duplication, CACTA and 
M7~elements played roles in the evolution of gene expression divergence and regulation between these two contrasting genotypes 
under normal or high salinity stress conditions. 

Key words: custom microarray, gene set enrichment analysis, promoter motif, single nucleotide polymorphism, tandem and 
segmental duplication, transposition. 



Introduction 

Salinity is one of the major environmental factors that limit 
crop production. Rice is moderately susceptible to salinity and 
most rice cultivars are severely injured when grown under field 
conditions with an electrical conductivity 6-10dSrrT 1 and 
yield losses were estimated to be 30-50% (Zeng et al. 
2002; Islam et al. 2007). The yield losses would be due to 
high concentrations of saline in the soil (5dSm~ 1 ^ 50 mM 
NaCI), which led to ion cytotoxicity, metabolic imbalances, 
and stress damage (Zhu 2001; Chinnusamy et al. 2005; 
Joseph and Mohanan 2013). With increasing population 
and reduced area of rice field, it is essential to breed and/or 
genetically engineer salinity-tolerant rice varieties suitable for 



planting in saline soil such as coastal areas. Indica rice varieties 
generally show higher level of tolerance when compared with 
japonica rice varieties (Lee et al. 2003). Furthermore, within 
either japonica or indica germplasms, they also exhibited sig- 
nificant differences in their response to high salinity stress 
(Gregorio et al. 2002; Lee et al. 2003; Ismail et al. 2007). 
These results suggested that valuable salinity stress-related 
genes could be found from rice germplasms, and these 
genes can be employed to improve rice varieties by molecular 
breeding. Therefore, the key is how to efficiently identify these 
genes and use them for rice genetic improvement. 

Genome-wide expression analysis is an efficient tool to 
identify differentially expressed genes under high salinity 
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stress conditions. Early rice microarray analysis was carried out 
using the custom microarrays including only 1,728 cDNAs 
from libraries of salt-stressed roots (Kawasaki et al. 2001). 
By using the chips, around 10% of probes were detected 
with differential expression regulation under salinity stress. 
After that, Affymetrix microarrays were used to investigate 
transcriptional profiling under salt stress. Walia et al. (2005) 
compared expression patterns of two contrasting indica rice 
lines FL478 (tolerance to high salinity) and IR29 (sensitive to 
high salinity) during vegetative growth stage. Their data 
showed that salinity stress induced a number of genes in- 
volved in flavonoid biosynthesis pathway only in IR29 but 
not in FL478 and cell wall-related genes were responsive in 
both genotypes. Cotsaftis et al (2011) analyzed the root- 
specific transcript profiling by comparing four indica rice 
lines under salinity stress. They have identified some genes 
and their families that are likely involved in the response to 
this stress. All of the aforementioned data were from indica 
rice. Genome-wide comparative expression was also carried 
out using both japonica and indica rice lines under salinity 
stress (Walia et al. 2007). However, they only analyzed 
the expression during panicle initiation stage. Salinity suscep- 
tibility to abiotic stress varies during rice life cycle. During 
germination stage, rice is relatively tolerant to high salinity 
stress and becomes very sensitive at two stages, including 
young seedling stage and early reproductive stage. Thus, no 
comparative expression data are available at young seedling 
stage between indica and japonica rice lines under salinity 
stress. 

Under high salinity stress, diverse damages have been 
observed in plants such as the disruption of intracellular ion 
homeostasis, membrane dysfunction, inhibition of metabolic 
activity and reduction of photosynthesis, and production of 
ROS (reactive oxygen species) (Zhu 2001 ; Mahajan and Tuteja 
2005). Some gene families have been observed to play crucial 
roles in these damages and their recovery or other stress-re- 
lated biological processes. Here, seven different gene families 
were selected for transcriptional profiling, and they were 
named as high-affinity K + transporter (HKT), Na_H_exchanger 
(NHX), calmodulin, calcineurin B-like protein and its target 
kinase (CIPK), vacuolar H(+)-translocating inorganic pyropho- 
sphatase (VHP), dehydrin, and stress-responsive protein (SRP) 
families. The HKT family mediates important Na + tolerance 
mechanisms in plants (Maser et al. 2002; Ren et al. 2005; 
Horie et al. 2009). For example, OsHKT2;1 (previously 
named as OsHKTI) in rice was downregulated in 30 mM 
NaCI (Horie et al. 2001) and played a role in mediating Na + 
influx into roots of K + -starved rice plants (Horie et al. 2007). 
The NHX family plays roles in maintaining cellular pH and Na + 
homeostasis by moving Na + either out of cells or into organ- 
elles in exchange for H~ (Rodnguez-Rosales et al. 2009). 
Calmodulin proteins were also involved in salt stress and ion 
homeostasis as well as in transcriptional regulation under 
stress (Ranty et al. 2006; Reddy et al. 2011). CIPKs play 



important roles in plant calcium signaling in response to stres- 
ses (Kolukisaoglu et al. 2004). VHP is an electrogenic proton 
pump that translocates protons into vacuoles in plant cells 
(Silva and Geros 2009). Efficient Na + exclusion is one of the 
major mechanisms for salinity tolerance (Munns and Tester 
2008). The exclusion of Na + is driven by the plasma mem- 
brane H(+)-ATPase and by the vacuolar membrane H(+)- 
ATPase and H(+)-pyrophosphatase. Dehydrins are a class of 
hydrophilic, thermostable stress proteins that belong to the 
Group II late embryogenesis abundant family. These genes 
are expressed during late embryogenesis or in vegetative tis- 
sues in response to abiotic stresses in some of the tested plants 
(Yang et al. 2012). SRP family members are proteins with 
Pfam domain (PF06219). Their functions are not yet deter- 
mined. On the other hand, transcription factors also play im- 
portant roles in high salinity stress. Currently, many members 
from different families of transcription factors have been 
proven with biological functions in plant salinity stress re- 
sponse. For example, AP2/EREBP transcription factors consist 
of four subfamilies including AP2, CBF/DREB, ERF, and RAV, 
and at least one member from these subfamilies have been 
involved in salinity stress (Zhang et al. 2009). Many transcrip- 
tion factors from other families also function in salinity stress- 
related biological processes such as WRKY and NAC (No apical 
meristem, Arabidopsis transcription activation factor, Cup- 
shaped cotyledon) transcription factors (Ramamoorthy et al. 
2008; Nakashima et al. 2012). 

In Affymetrix rice microarray chips, the arrays contain 
probes to query approximately 48,564 japonica and 1,260 
indica transcripts. Compared with the release 7 of Michigan 
State University (MSU) rice genome annotation database 
(http://rice.plantbiology.msu.edu/ [last accessed October 28, 
2013]; Ouyang et al. 2007; Kawahara et al. 2013), only 
<36,000 out of 56,081 annotated genes were probed in 
chips. No expression data are available for the remaining an- 
notated genes using the chips. In this study, we have designed 
custom microarray chips with probes covering majority of an- 
notated genes from the MSU database. The chips were then 
employed for comparative transcript profiling analysis be- 
tween salinity-sensitive and tolerant rice lines under normal 
and high salinity stress conditions. We have also identified 
single nucleotide polymorphisms (SNPs) between these two 
genotypes at genome level and investigated their biological 
effects. We compared their genomic variation and expression 
divergence and analyzed overrepresented genes and their 
functions by using gene set enrichment analysis (GSEA; 
Mootha et al. 2003) program. We then investigated up-/ 
downregulated genes by high salinity stress, specially focusing 
on these genes encoding various transcription factors and 
stress-related proteins. We also surveyed molecular basis 
and mechanisms underlying expression divergence and evolu- 
tion. Our data showed that genes with large-effect SNPs be- 
tween these two genotypes were limited, and they mainly 
function in protein and macromolecular modification by 
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GSEA. We have detected around 10% of probes/genes with 
differential expression under high salinity stress. Both sensitive 
and tolerant genotypes showed significant gene expression 
divergence under normal and NaCI stress conditions. 
Genomic variations in promoter regions resulted from SNPs, 
indels (1-10 bp of insertion/deletion), and structural variations 
(SVs, including large scale deletions, insertions, duplications, 
inversions, and translocations) significantly contributed to the 
expression divergence and regulation. Some gene families 
were highly regulated by the abiotic stress and might play 
key roles in stress-responsive biological processes. Our data 
also showed that tandem and segmental duplication, 
CACTA and hAT elements significantly contributed to expres- 
sion divergence and regulation between these two contrast- 
ing genotypes under normal conditions or high salinity stress. 

Materials and Methods 

Plant Materials, Growth Conditions, and NaCI Stress 
Treatments 

Both Nipponbare (japonica) and Pokkali (indica) rice plants 
(Oryza sativa L.) were used for all experiments in this study. 
After seed germination, they were planted in greenhouse and 
were grown under natural light and temperature conditions. 
Two-week old seedlings were collected from pots and washed 
with water to remove all soil. The seedling plants were then 
treated with salinity by submerging their roots into 250 mM of 
sodium chloride (NaCI) solution. Samples were collected in 
0- and 8-h intervals. 

Custom Design of Agilent Microarray Chips, 
Hybridization, and Data Analysis 

The publicly available eArray software (Agilent Technologies, 
Santa Clara, CA) was used to design custom 60-mer oligonu- 
cleotide probes. Positive and negative control probes were 
designed from six housekeeping genes encoding elongation 
factor 1 -alpha, sucrose synthase, ubiquitin, CYP86, actin, 
and GAPDH (glyceraldehyde-3-phophate dehydrogenase). In 
rice, a total of 66,433 gene models were annotated in the 
MSU rice genome annotation database (http://rice.plantbiol- 
ogy.msu.edu/; Ouyang et al. 2007; Kawahara et al. 2013), 
from which 40,800 non-transposable element (non-TE) and 
14,278 TE probes (60 bp each) were designed, and the re- 
maining genes were not suitable for probe design due to 
the sequence homology of other genes. Therefore, we have 
designed a total of 55,078 probes for microarray analysis. 
These probe sequences were then submitted for manufactur- 
ing on 8 x 60 k format of chips. Hybridization was carried out 
using the standard Agilent protocol. Agilent GeneSpring GX 
11.5 software was used for data analysis. All expression data 
in this work have been deposited into the NCBI GEO database 
(www.ncbi.nlm.nih.gov/geo, last accessed October 28, 2013) 
under the GEO accession number GSE48395. 



Microarray Data Verification by Quantitative Real-Time 
Reverse Transcription Polymerase Chain Reaction 

A total of 12 genes were randomly selected to verify our mi- 
croarray expression data by quantitative real-time reverse tran- 
scription polymerase chain reaction (qRT-PCR). The Applied 
Biosystems Primer Express® software was used to design all 
primer sequences. Non-gene-specific primer sequences fil- 
tered by Blast searches were then excluded instead of newly 
designed ones. All the primer sequences used in this study are 
listed in supplementary table S1, Supplementary Material 
online. QIAGEN RNeasy Mini Kit was used for total RNA iso- 
lation. The first strand cDNA was synthesized using Invitrogen 
kit. The qRT-PCR analyses were carried out using the AB 
power SYBR Green PCR Master mix kit (Applied Biosystems, 
P/N 4367659) according to the manufacturer's protocol. The 
amplification of an eEF-1a gene was used as an internal con- 
trol to normalize the data and the corresponding sequences of 
these primers are listed in supplementary table S1, 
Supplementary Material online. The threshold cycle (C T ) 
value was automatically calculated based on the changes in 
fluorescence of SYBR Green I dye in every cycle monitored by 
the ABI 7900 system software. The mRNA relative amount 
was estimated from the C T value according to our previous 
description (Jiang et al. 2007) and was used to evaluate ex- 
pression level of analyzed genes. 

Identification of Genes with Differential Expression 
between Nipponbare and Pokkali under Normal 
Conditions or High Salinity Stress 

Differentially expressed genes between Nipponbare and 
Pokkali were identified by their expression abundance. 
Processed microarray data by computing geometric mean 
were used for statistical analyses. Genes with at least two 
times difference in their mRNA levels between these two ge- 
notypes were submitted for Student's Mest. These genes with 
a statistical difference at P< 0.05 were regarded as differen- 
tially expressed genes between these two genotypes. A similar 
standard was also applied for the identification of up- or 
downregulated genes under high salinity stress conditions. 

Determination of Overrepresented Genes and Their Gene 
Ontology Categories 

Gene ontology (GO) assignments for rice genes were obtained 
from the release 7 of MSU rice genome annotation data- 
base (http://rice.plantbiology.msu.edu/; Ouyang et al. 2007; 
Kawahara et al. 2013). We analyzed all three top GO catego- 
ries including biological processes (P), molecular functions (F), 
and cellular components (C) (Harris et al. 2004). GSEA was 
used to determine over- or underrepresented genes and their 
GO categories. The Z test was used to test whether tandem/ 
segmental duplication or transposition by mobile elements 
significantly contributes to expression divergence between 
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Nipponbare and Pokkali or expression regulation under high 
salinity stress. 

Promoter Isolation and Sequencing 

To evaluate the contribution of indels and SVs in promoter 
regions to expression divergence and regulation, 24 genes 
with expression divergence/regulation between Nipponbare 
and Pokkali under either normal or high salinity stress condi- 
tions were selected. Their 1 -kb promoter fragments were am- 
plified from genomic DNA samples by PCRs. Primer sequences 
were listed in supplementary table S1, Supplementary 
Material online. PCRs were carried out using Qiagen Taq 
DNA Polymerase with cat. no. 201205. PCR amplifications 
were performed in 20jil reaction mixtures with 50 ng of 
DNA, 200 jiM of each of dNTPs, 0.5 nM each of primers, 
1.5mM MgCI 2 , 1 unit of Taq DNA polymerase, and buffer 
provided by the supplier. The reactions were performed in the 
PTC200 (MJ Research) thermocycler. The program used for 
PCR was 94 °C for 2 min followed by 30 cycles at 94 °C for 
30 s, 55-60 °C for 60 s, and 72 °C for 1 20 s. The reaction was 
terminated with a 5-min extension step at 72 °C. PCR prod- 
ucts of expected sizes were purified from agarose gel using 
Qiagen Gel Extraction Kit. These DNA fragments were directly 
used for sequencing by DNA Sequencing Kit (PE Applied 
Biosystems). Sequencing reactions were carried out in 
PTC200 (MJ Research) thermocycler using the ABI cycle se- 
quencing protocol: 30 cycles at 96 °C for 30 s, 50 °C for 15 s, 
and 60 °C for 4 min. 

Detection of Promoter Motifs and Their 
Overrepresentation Analysis 

Nipponbare promoter sequences 1 -kb upstream of start 
codon of each gene were retrieved from the MSU rice 
genome annotation database (http://rice.plantbiology.msu. 
edu/ [last accessed October 28, 2013]; Ouyang et al. 2007; 
Kawahara et al. 2013). Promoter motifs were detected by 
submitting these promoter sequences to the PLACE database 
(Higo et al. 1999; http://www.dna.affrc.go.jp/PLACE/index. 
html, last accessed October 28, 2013). A small program was 
designed to investigate the frequency of a promoter motif. 
Overrepresented motifs were detected by the BioProspector 
software (Liu et al. 2001). Promoter logos were generated by 
the enoLOGOS program (Workman et al. 2005). 

Genome Sequence and Annotation Databases 

Nipponbare genomic, cDNA, and protein sequences were 
downloaded from the release 7 of MSU rice genome annota- 
tion database (http://rice.plantbiology.msu.edu/, last accessed 
October 28, 201 3; Ouyang et al. 2007; Kawahara et al. 201 3). 
Pokkali SNP data were downloaded from the OryzaSNP proj- 
ect (McNally et al. 2009; http:/AAAAAA/.OryzaSNP.org, last 
accessed October 28, 201 3). A small perl program was written 
to map all SNPs from Pokkali onto the Nipponbare genome so 



that we can compare the effect of SNPs on genes and their 
promoters at genome level. 

Identification of Transcription Factors and NaCI 
Stress-Related Seven Gene Families at Genome Level 

A total of 84 families of transcription factors (Perez-Rodriguez 
et al. 201 0) were selected for genome-wide identification. The 
GRASSIUS database was also used for the identification of rice 
transcription factors (Yilmaz et al. 2009). In addition to these, 
for genome-wide identification of these transcription factors 
or seven gene family members, their protein motif or domain 
sequences from seed members were obtained from the Pfam 
database (http://pfam.sanger.ac.uk/, last accessed October 28, 
2013). Their seed amino acid sequences were used to con- 
struct hidden Markov model (HMM) profiles using HMMER 
2.3.2 (http://hmmer.janelia.org/, last accessed October 28, 
2013) with default values. Using the profile HMMs, we 
scanned the rice annotated protein database and searched 
for all putative transcription factors and seven gene family 
members. These members were further confirmed by motif 
and domain analysis. All identified genes encoding transcrip- 
tion factors and seven gene families are listed in supplemen- 
tary tables S2 and S3, Supplementary Material online, 
respectively. 

Genome-Wide Identification of Tandemly/Segmentally 
Duplicated or Mobile Element Related Genes 

DNA, cDNA, and protein sequences downloaded from the 
release 7.0 of the MSU rice genome annotation database 
were used for genome-wide identification of tandem/ 
segmental duplication and mobile element-related genes. 
Tandemly and segmentally duplicated rice genes were identi- 
fied according to the description (Jiang, Gonzalez, et al. 201 3). 
Full-length long terminal repeat (LTR) retrotransposon ele- 
ments were identified by executing the LTR_Finder program 
(Xu and Wang 2007). CACTA DNA transposon elements were 
identified by both BlastN and HMM searches. We first carried 
out BlastN searches using both terminal inverted repeats (TIRs) 
and subterminal repeats (TRs) of the elements from multiple 
species (Pereira et al. 1986; Wang et al. 2003; Wicker et al. 
2003; and references therein). We also built a HMM profile 
using HMMER 2.3.2 (http://hmmer.janelia.org/, last accessed 
October 28, 201 3) with default values using seed TIRs and TRs 
from multiple species to scan the rice genome. These searches 
generated two sets of sequences including 5' and 3' terminal 
regions, respectively. Similar strategies were used to identify 
rice M7"elements with query sequences from multiple species 
(Hehl et al. 1991; Huttley et al. 1995; Kempken and 
Windhofer 2001; Moon et al. 2006). Full-length CACTA and 
M7~elements were identified by matching these two terminal 
regions with <30,000 and 4,000 bp in length, respectively. 
Identification of retrogenes, Pack-MULE (Mutator-like trans- 
posable element), hAT (hobo-Ac-Tam3), and Helitron 
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elements was carried out according to the description (Juretic 
et al. 2005; Wang et al. 2006; Sweredoski et al. 2008). These 
genes fully or partially located within a mobile element were 
regarded as mobile element-related genes. 

Results 

Genomic Variations between Nipponbare and Pokkali 
Revealed by SNPs 

Nipponbare is a japonica rice and is sensitive to high salinity 
stress. Pokkali is an indica rice line and is tolerant to abiotic 
stress. The Nipponbare genome has been sequenced (Goff 
et al. 2002), and the SNP data are available for the Pokkali 
genome (McNally et al. 2009). On the basis of these data, we 
have identified a total of 131,080 SNPs located on coding 
regions. Up to 221 genes contain large-effect SNPs that lead 
to premature stop codons or changed start codons with 
shorter or longer length of open reading frames. GSEA 
showed that overrepresented genes with large-effect SNPs 
function mainly in carbohydrate/nucleotide/protein binding. 
Their proteins were mainly located in the plasma membrane 
by prediction and were involved in biological processes includ- 
ing protein/macromolecule modification by GSEA (fig. 1A). 
Expression analysis showed that only around 65% of them 
were detected with expression signal, whereas on average, up 
to 83% of annotated rice genes were expressed in this study 
(see later). Furthermore, their expression levels are relatively 
low when compared with other genes without large-effect 
SNPs. In addition, only a few genes among these large- 
effect SNP-containing genes were regulated by high salinity 
stress in their expression. 

Besides large-effect SNPs, other SNPs located on coding 
regions might contribute to either nonsynonymous or synon- 
ymous substitutions. As synonymous substitutions may not 
lead to any change in protein sequences, we only analyzed 
genes with nonsynonymous substitutions. Totally, we have 
detected 7,561 genes, in which nonsynonymous substitutions 
occurred. GO analysis showed that overrepresented proteins 
were mainly localized on plasma membrane. Their overrepre- 
sented molecular functions include kinase, pyrophosphatase, 
and transferase activities as well as catalytic, molecular trans- 
ducer, motor, and receptor activities (fig. ^B). They mainly 
participate in biological regulation, protein modification, sig- 
naling, and stress response (fig. IB). Expression analysis 
showed that similar percentages of genes were detectable 
between two groups of genes with either nonsynonymous 
or synonymous substitutions. 

General Outlines of Expression Patterns in Nipponbare 
and Pokkali 

To figure out the difference in expression patterns between 
Nipponbare and Pokkali under normal and high salinity stress 
conditions, custom microarray chips were manufactured and 



used for comparative expression profiling. The expression 
analysis was carried out using 14-day-old whole seedlings 
from both Nipponbare and Pokkali rice lines. Two biological 
replicates were performed for both control and 250 mM 
NaCI treatments, resulting in a data set of eight microarrays 
(fig. 2A). Total RNA quality was analyzed by nanodrop reading 
and Agilent Bioanalyzer running (fig. 2B). The data quality was 
assessed by measuring the correlation coefficients (fig. 2Q. 
The general coefficients between two biological replicates 
ranged from 98.3% (under normal growth conditions) to 
98.8% (under NaCI treatments). We have randomly selected 
459 probes, and each probe was printed into 1 5 or 16 differ- 
ent positions in the same chip for technical replicates. The 
resulted coefficients among technical replicates were mea- 
sured from 98.9% to 99.6%. These analyses suggested the 
high quality of data obtained in this study. On the other hand, 
we have also validated our data by qRT-PCR (fig. 2D). 

In the release 7 of MSU rice genome annotation 
database (http://rice.plantbiology.msu.edu/ [last accessed 
October 28, 2013]; Ouyang et al. 2007; Kawahara et al. 
2013), a total of 39,102 protein-coding transcripts have 
been annotated (non-TE). Out of the collection, 38,517 
probes were generated and for the remaining 585 transcripts, 
no probe was generated due to duplicated sequences. 
Among 38,517 genes/loci, we have randomly selected 
2,283 genes/loci with alternative splicing for analyzing expres- 
sion divergence between different splicing models. Thus, a 
total of 40,800 probes were designed for non-TE genes/loci 
(fig. 2E). On the other hand, we have also designed 14,278 
probes from these genes/loci encoding for TEs. Totally, we 
have designed 55,078 probes representing 38,517 non-TE 
and 14,278 TE genes. These probes have been designed for 
array manufacturing on Agilent 8 x 60k format. 

Among a total of 55,078 probes from 52,795 genes/loci 
printed on the microarray chips, 43,964 probes were detect- 
able in Nipponbare (fig. 2£). Among them, 33,419 probes 
were from non-TE genes, and the remaining 10,545 were 
from TE genes. In Pokkali, 43,578 probes were detected 
with expression signaling including 33,097 non-TE and 
10,481 TE genes (fig. 2E). Because some genes contain alter- 
native splicing, a total of 41 ,681 (78.9%) and 41 ,295 (78.2%) 
genes were expressed in Nipponbare and Pokkali, respectively 
(fig. 2E). No statistical difference was detected in their per- 
centages of expressed genes between these two genotypes. 
Higher percentages of expressed genes were detected in non- 
TE genes when compared with TE genes in both genotypes. 

Differentially Expressed Genes between Nipponbare and 
Pokkali 

Based on the genome-wide SNP analysis between Nipponbare 
and Pokkali (fig. 1), limited divergence was observed in these 
two genomes. However, they exhibit obvious differences in 
their phenotypes. To understand their inside molecular bases, 



2036 Genome Biol. Evol. 5(1 1):2032-2050. doi:10.1093/gbe/evt1 52 Advance Access publication October 11, 2013 



Genome-Wide Survey in Rice Genotypes 



GBE 



Genes with large-effect SNPs ■ All annotated rice genes 



20 
16 
12 




GO:0006464 GO:0043412 GO:0030246 GO:0005515 GO:0000166 GO:0005886 
P F C 



B 



40 



30 













xI-On xI-on — H • JJ oo -oo ■ 1 

• t^^o fs ? fs i r^vo oor-~ oor-~ oo°° XA ■ ^^^^ 


r_ 10.97 
t 12.5: 

k n.19 

fc 12.36 


0.62 
0.34 

5. 

" 6 
6 

5. 

~ 5 
5. 

Z 5 
5. 

1.19 

1 0.98 
[ 1.19 
1 0.98 
I 1.49 
I 1.39 
[ 1.49 
1.39 

1.04 
1 0.82 
r 0.85 
' 0.65 

0.46 

0.32 
; 1.28 
I 1.14 

0.46 

0.32 

0.46 

0.32 


° ° : 1 : 1 



g> 20 



10 



(N IT) IT) ON 

in h \o oo 
o oo 

m o\ h o 

(N O O >n 
OOOO 



'sf- o o 
On Tt O 

l> o o o 
o co co in 

in M (N 

oooo 



Tt l> 00 

m \o ^ o\ 

o • - 
o o 



if) o> O t h 

vo in in vo o 

(N (N Q> ' — 1 ' — i CO 

vo 00 vo t-- O 

O O O O 

o o o o o o 



(N OO 

o 3- 

CO O 

o o 



CO iTj 1^ ^ (N 

h (N h h (N \D 

00 00 00 t> 00 "tf" 

On vo co co \o 

O O 

o o o o o o 



ON h rH \C 

00 00 O 00 
O O 00 00 

o in 

VO O O 

oooo 



ooooooooooooooooooooooooooooooooo 



oooo 



oooo 
oooo 



OOOO 
OOOO 



O O O O O O 
OOOO o o 



oooo 
oooo 



o o o o o o 
o o o o o o 



o o o o o 
o o o o o 



Fig. 1. — The effects of SNPs on gene functions and GSEA. (A) GO term analysis of genes with large-effect SNPs by comparing with all annotated rice 
non-TE genes. Green and pink columns indicate the percentages of this GO term in SNP-affected proteins and total annotated rice proteins in the release 7 of 
MSU rice genome annotation database (http://rice.plantbiology.msu.edu/ [last accessed October 28, 2013]; Ouyang et al. 2007; Kawahara et al. 2013), 
respectively. (B) GO term analysis of genes with nonsynonymous substitutions by SNPs. Green and blue columns indicate the percentages of this GO term in 
SNP-affected proteins and total annotated rice proteins, respectively. Only GO terms with statistically significant differences at P< 0.05 were listed in (A) and 
(B). "F," and "C" in (A) and (B) indicate the GO categories biological process, molecular function, and cellular component, respectively. GO term 
annotation in (A) and (B) refers to the following: GO:0006464 (protein modification process), GO:0043412 (macromolecule modification), GO:0030246 
(carbohydrate binding), GO:0005515 (protein binding), GO:0000166 (nucleotide binding), GO:0005886 (plasma membrane), GO:0023052 (signaling), 
GO:0009875 (pollen-pistil interaction), GO:0007165 (signal transduction), GO:0050789 (regulation of biological process), GO:0050794 (regulation of 
cellular process), GO:0023046 (signaling process), GO:0023060 (signal transmission), GO:0065007 (biological regulation), GO:0051704 (multiorganism 
process), GO:0009856 (pollination), GO:0016265 (death), GO:0008219 (cell death), GO:0006950 (response to stress), GO:0007154 (cell communication), 
GO:0016301 (kinase activity), GO:0016772 (transferase activity, transferring phosphorus-containing groups), GO:0016740 (transferase activity), 
GO:0004872 (receptor activity), GO:0016818 (hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides), GO:0019825 (oxygen 
binding), GO:001 681 7 (hydrolase activity, acting on acid anhydrides), GO:0003774 (motor activity), GO:0003824 (catalytic activity), GO:001 6462 (pyropho- 
sphatase activity), GO:0017111 (nucleoside-triphosphatase activity), GO:0060089 (molecular transducer activity), GO:0016787 (hydrolase activity), and 
GO:0004871 (signal transducer activity). 



we further analyzed their expression divergence between 
these two genotypes under normal and high salinity stress 
conditions. Our data revealed 3,927 genes with at least two 
times lower expression abundance (downregulated) in Pokkali 
when compared with Nipponbare (fig. 3A). Among them, 
808 and 1,750 genes were detected from normal and high 
salinity stress conditions, respectively. The remaining 1,369 
genes showed differential expression in both growth condi- 
tions. On the other hand, only 1,835 genes were detected 
with at least two times higher expression (upregulated) in 
Pokkali when compared with Nipponbare (fig. 3B). 
Interestingly, under high salinity stress, much more differen- 
tially expressed genes were detected between these two 



genotypes than under normal growth conditions (fig. 3A 
and B). 

To evaluate whether these differentially expressed genes 
between these two genotypes were biased toward certain 
specific functions, we carried out the investigation on GO. 
We identified over- or underrepresented GO terms by 
GSEA. Among genes with lower expression abundance in 
Pokkali (fig. 3A), a total of four GO terms were underrepre- 
sented in Pokkali when compared with that in Nipponbare 
(fig. 30- These GO terms included GO:0006950 (response 
to stress), GO:0000166 (nucleotide binding), GO:0016301 
(kinase activity), and GO:0016772 (transferase activity, trans- 
ferring phosphorus-containing groups). On the other hand, 
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Fig. 2. — Sampling and quality assessment of the microarray expression data and general expression profiling. (A) Sample collection of 14-day-old whole 
seedlings. (B) An example of total RNA quality analysis by nanodrop reading and Agilent Bioanalyzer running. (O Coefficiency analysis. The general 
coefficients between two biological replicates ranged from 98.3% (under normal growth conditions) to 98.8% (under NaCl treatments). (D) An example 
of microarray expression data confirmed by qRT-PCR. A total of 1 2 genes were selected for qRT-PCR analysis. (£) Numbers of probes used in microarray chips 
and general expression profiling of genes encoding TEs and non-TEs. 



among genes with higher expression in Pokkali (fig. 3B), a 
total of four overrepresented GO terms were detected, 
which included GO:0019825 (oxygen binding), 
GO:0016740 (transferase activity), GO:0016301, and 
GO:0016772 (fig. 3D). Both GO:0016301 and GO:0016772 
were underrepresented in genes with lower expression in 
Pokkali (fig. 30 but were overrepresented in genes with 
higher expression in Pokkali (fig. 3D). The term GO:0003824 
(catalytic activity) was the only term with underrepresentation 
in genes with higher expression in Pokkali (fig. 3D). 

Differentially Regulated Genes under High Salinity Stress 
in Both Nipponbare and Pokkali 

Nipponbare is a salinity-sensitive rice variety, whereas Pokkali 
is a salinity-tolerant line. We are interested in exploring the 



difference in gene expression under high salinity stress be- 
tween these two genotypes. We have identified a total of 
2,282 upregulated genes under high salinity stress (250 mM 
NaCl). Among them, 947 and 652 genes were detected only 
in Pokkali and Nipponbare, respectively. The remaining 683 
genes were detected in both genotypes (fig. 44). On the other 
hand, compared with upregulated genes, we have identified 
nearly double numbers of downregulated genes under high 
salinity stress (fig. 4B). Among the 3,253 downregulated 
genes, 789 and 1,439 genes were detected only in Pokkali 
and Nipponbare, respectively, indicating a higher percentage 
of downregulated genes in Nipponbare (fig. 4B). The remain- 
ing 1,025 genes were detected in both genotypes. 

We were also interested in the biological functions of 
these up- or downregulated genes. We analyzed these 
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Fig. 3. — Differentially expressed genes between Nipponbare and Pokkali. (A) Differentially expressed genes with at least two times lower in Pokkali in 
their expression level when compared with Nipponbare (downregulated genes). (B) Differentially expressed genes with at least two times higher in Pokkali in 
their expression level when compared with Nipponbare (upregulated genes). (0 Over- or underrepresented GO terms by GSEA in downregulated genes as 
shown in (A). (D) Over- or underrepresented GO terms by GSEA in upregulated genes as shown in (B). GO terms highlighted by red and green fonts were 
detected in both down- and upregulated genes. "P" and "F" indicate the GO categories biological process and molecular function, respectively. GO term 
annotation in (A) and (B) refers to the following: GO:0006950, response to stress; GO:0000166, nucleotide binding; GO:0016301, kinase activity; 
GO:0016772, transferase activity (transferring phosphorus-containing groups); GO:0019825, oxygen binding; GO:0016740, transferase activity; 
GO:0003824, catalytic activity. 



genes regulated by high salinity stress only in either 
Nipponbare or Pokkali but not in both genotypes as we 
were interested in figuring out the difference between 
these two genotypes in response to high salinity stress. 
For 947 upregulated genes only in Pokkali, three GO 
terms, GO:0009628 (response to abiotic stimulus), 
GO:0006950 (response to stress), and GO:0050896 (re- 
sponse to stimulus), were underrepresented; the remaining 
five GO terms were overrepresented (fig. 40- They are 
GO:0009991 (response to extracellular stimulus), 
GO:0007154 (cell communication), GO:0019825 (oxygen 
binding), GO:0005777 (peroxisome), and GO:0042579 
(microbody). However, for 652 upregulated genes in 
Nipponbare, only two overrepresented GO terms were de- 
tected, and they were also presented in Pokkali (fig. 40- For 
1,439 downregulated genes in Nipponbare, 23 GO terms 
were detected, and all of them were overrepresented 
(fig. 4D and £). Among them, 16 overrepresented GO 
terms were Nipponbare specific (fig. 4D). They are 
GO:0006091 (generation of precursor metabolites and 
energy), GO:0005975 (carbohydrate metabolic process), 



GO:0007049 (cell cycle), GO:0009628 (response to abi- 
otic stimulus), GO:0006629 (lipid metabolic process), 
GO:0016818 (hydrolase activity, acting on acid 
anhydrides, in phosphorus-containing anhydrides), 
GO:0016817 (hydrolase activity, acting on acid anhydrides), 
GO:0003774 (motor activity), GO:0016462 (pyrophospha- 
tase activity), GO:001 71 1 1 (nucleoside-triphosphatase activ- 
ity), GO:0003824 (catalytic activity), GO:0019825 (oxygen 
binding), GO:0009536 (plastid), GO:0044444 (cytoplasmic 
part), GO:0005737 (cytoplasm), GO:0016020 (membrane), 
and GO:0005623 (cell). The remaining seven GO terms 
were detected in both Pokkali and Nipponbare (fig. 4E). 
They are GO:0006629 (lipid metabolic process), 
GO:0015979 (photosynthesis), GO:0019748 (secondary 
metabolic process), GO:0009579 (thylakoid), GO:0005576 
(extracellular region), GO:0030312 (external encapsulating 
structure), and GO:0005618 (cell wall). Interestingly, for 
each GO term, higher percentages of downregulated 
genes were detected in Nipponbare when compared with 
Pokkali (fig. 4E). Generally, under high salinity stress, genes 
with functions in response to stress or stimulus were 
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Fig. 4. — Differentially regulated genes under high salinity stress in both Nipponbare and Pokkali. (A) and (B) Up- and downregulated genes under 
250 mM NaCl, respectively. (0 Over- or underrepresented GO terms by GSEA in upregulated genes as shown in (A). GO terms highlighted by green and red 
fonts were these terms detected in both Pokkali and Nipponbare. (D) Overrepresented GO terms by GSEA in downregulated genes only in Nipponbare as 
shown in (B). (£) Overrepresented GO terms by GSEA in downregulated genes in both Nipponbare and Pokkali as shown in (B). GO term annotation in (Q-(/F) 
refers to the following: GO:0009628, response to abiotic stimulus; GO:0006950, response to stress; GO:0009991, response to extracellular stimulus; 
GO:0007154, cell communication; GO:0050896, response to stimulus; GO:0019825, oxygen binding; GO:0005777, peroxisome; GO:0042579, microbody; 
GO:0006091, generation of precursor metabolites and energy; GO:0005975, carbohydrate metabolic process; GO:0007049, cell cycle; GO:0016818, 
hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides; GO:0016817, hydrolase activity, acting on acid anhydrides; 
GO:0003774, motor activity; GO:0016462, pyrophosphatase activity; GO:00171 11, nucleoside-triphosphatase activity; GO:0003824, catalytic activity; 
GO:0009536, plastid; GO:0044444, cytoplasmic part; GO:0005737, cytoplasm; GO:0016020, membrane; GO:0005623, cell; GO:0006629, lipid metabolic 
process; GO:0015979, photosynthesis; GO:0019748, secondary metabolic process; GO:0009579, thylakoid; GO:0005576, extracellular region; 
GO:0030312, external encapsulating structure; GO:0005618, cell wall. 
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down- or upregulated, and obvious differences were ob- 
served in the gene expression profilings between 
Nipponbare and Pokkali. 

Expression Profile of Genes Encoding Transcription 
Factors 

Among the differentially expressed genes between 
Nipponbare and Pokkali under normal and high salinity 
stress conditions, we were interested in these genes encoding 
transcription factors. We have identified 2,777 genes encod- 
ing 93 different transcription factor families in the rice genome 
(supplementary table S2, Supplementary Material online). 
Among them, 319 (1 1 .5%) genes showed up- or downregu- 
lated expression under high salinity stress, and these genes 
were from either Nipponbare or Pokkali. Among these regu- 
lated genes, around one third of them showed similar expres- 
sion patterns under NaCI stress between two genotypes and 
the remaining two thirds of them showed expression diver- 
gence between these two genotypes. The data suggested a 
high expression divergence of transcription factor encoding 
genes between Nipponbare and Pokkali under NaCI stress. 

We then further analyzed the families with ten or more 
members in their responses to high salinity stress in detail. 
We identified a total of nine such families including Ap2/ 
EREBP, bHLH (basic helix-loop-helix), bZIP (basic region/leucine 
zipper motif), C2H2 (Cysteine-2/Histidine-2), HB (homeobox), 
MYB (Myeloblast), NAC, Orphan, and WRKY (fig. SA). The 
largest two families are MYB and AP2/EREBP. A total of 33 
MYB and 27 AP2/EREBP family members were up- or 
downregulated by NaCI in either Nipponbare or Pokkali. For 
the 33 MYB family members, four genes were downregulated 
by NaCI only in Nipponbare and 18 genes were up- or 
downregulated by NaCI only in Pokkali (fig. SB). The remain- 
ing 1 1 genes were regulated by NaCI in both Nipponbare and 
Pokkali (fig. SB). For the 27 AP2/EREBP family members, nine 
genes were up- or downregulated by NaCI only in Nipponbare 
and other nine genes were up- or downregulated by NaCI 
only in Pokkali (fig. 50- The remaining nine genes were reg- 
ulated by NaCI in both Nipponbare and Pokkali (fig. 5Q. 

Expression Profile of Genes Encoding Na+/Other Cations 
and Abiotic Stress-Related Proteins 

Because both Nipponbare and Pokkali showed obvious differ- 
ence in their response to high salinity stress, we were interested 
in expression profiles of genes encoding stress-related proteins 
in these two genotypes. We have analyzed seven different 
families including HKT, NHX, calmodulin, CIPK, VHP, dehydrin, 
and SRP. We have identified a total of 129 genes encoding 
these seven families (supplementary table S3, Supplementary 
Material online) and 42 (32.6%) of them were regulated by 
NaCI in either Nipponbare or Pokkali. The rate is obviously 
higher than the average percentages of NaCI-regulated 
genes in Nipponbare and Pokkali. However, different families 



exhibited difference in their percentages of regulated genes, 
from 14% for VHP to 86% for dehydrin (fig. 6A). Generally, 
the percentages were significantly higher than that in 
other genes encoding transcription factors or other proteins. 
Figure 6B showed the detailed expression patterns of 42 up- or 
downregulated genes. For the HKT family, all five differentially 
expressed genes were downregulated by NaCI in both 
Nipponbare and Pokkali, and no significant difference was ob- 
served between these two genotypes. A similar situation was 
also observed for the VHP family, where only one upregulated 
gene was detected under NaCI treatment. For the remaining 
five gene families, expression divergence was observed be- 
tween Nipponbare and Pokkali under high salinity stress. In 
these gene families, some genes were up- or downregulated 
by NaCI treatment only in either Nipponbare or Pokkali. 

On the basis of our microarray analysis, we have selected a 
total of 30 candidate genes as listed in supplementary table 
S4, Supplementary Material online. These genes encode var- 
ious transcription factors, Na+/other cations, and abiotic 
stress-related proteins. Their expression was differentially reg- 
ulated by high salinity stress in either one or two tested vari- 
eties. They would be promising candidate genes for improving 
rice tolerance to high salinity stress by genetic manipulation. 

Expression Divergence of Genes with Alternative Spliced 
Models in Nipponbare and Pokkali 

In the MSU database, a total of 6,457 genes/loci were anno- 
tated with alternative splicing models (http://rice.plantbiology. 
msu.edu/ [last accessed October 28, 2013]; Ouyang et al. 
2007; Kawahara et al. 2013). To explore the expression diver- 
gence of genes with alternatively spliced models in 
Nipponbare and Pokkali under either normal or high salinity 
stress conditions, we randomly selected 2,282 out of 6,457 
genes/loci annotated with alternatively spliced models. Two 
isoforms were analyzed for each gene. Among these genes/ 
loci, 2,212 genes showed expression signals. A total of 7 1 1 loci 
(32%) were detected with differential expression between 
two alternatively spliced models (fig. 7A). Most of the expres- 
sion divergence (521 genes, 73.3%) was observed in both 
Nipponbare and Pokkali under normal and high salinity 
stress conditions (fig. IB and O- Some of the expression diver- 
gence (151 genes, 21.2%) occurred only under high salinity 
stress (fig. ID). The other 39 divergent genes (5.5%) were 
genotype specific, that is, expression divergence was observed 
only in either Nipponbare or Pokkali (fig. IB). 

To further investigate whether these genes with expression 
divergence between two alternatively spliced models are 
biased to certain functions, we submitted these genes for 
GSEA. In biological process (P) category, no over- or underrep- 
resented GO term was detected under default parameters. 
In the molecular function (F) category, one GO term 
GO:0016787 (hydrolase activity) was detected with overrep- 
resentation (fig. IF). This may imply that hydrolase activity may 
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Fig. 5. — Expression profiling of rice genes encoding transcription factors under normal and high salinity stress conditions. (A) Differentially expressed 
genes in top ten families of transcription factors. (B) and (Q Heat maps showed up- or downregulated genes encoding MYB and AP2/EREBP transcription 
factors, respectively. Normalized expression values were calculated by Agilent GeneSpring GX 1 1 .5 software and were used for heat map analyses in figures 
5 and 6. Red, black, and green colors indicated that normalized expression values were >0, =0, and <0, respectively, in the matrix. Gray color indicated that 
no statistical difference in their expression level has been detected in these genes under high salinity stress. 



be evolutionarily regulated by alternative splicing between 
two genotypes under either normal or NaCl stress conditions. 
A majority of overrepresented GO terms were observed in the 
cellular component (C) category. A total of five GO terms 
were detected with statistical overrepresentation (fig. If). 
They are GO:0044444 (cytoplasmic part), GO:0005737 (cyto- 
plasm), GO:0044424 (intracellular part), GO:0005739 (mito- 
chondrion), and GO:0005622 (intracellular). 



Promoter Variation and Expression Divergence between 
Nipponbare and Pokkali 

We have detected more than 5,000 differentially expressed 
genes under normal or NaCl stress conditions between 
Nipponbare and Pokkali, and they were classified into six 
groups as shown in figure 3A and B. To explore the molecular 
basis of their expression divergence, we surveyed the SNP 
variations in their 1-kb promoter regions upstream of start 
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Fig. 6. — Expression profiling of rice genes encoding Na+/other cation and abiotic stress-related proteins. (A) Differential expression profiling in seven 
families of genes encoding Na+/other cation and abiotic stress- related proteins. (B) and (0 Heat maps showed up- or downregulated genes. Red, black, and 
green colors indicate the same annotation as shown in figure 5. Gray color indicates that no statistical difference in their expression level has been detected in 
these genes under high salinity stress. 



codon of genes. On average, 16.8% of rice genes showed 
SNP variations in their promoter regions (fig. SA). However, 
among the six groups of differentially expressed genes, genes 
from five groups showed significant lower or similar SNP var- 
iations when compared with the average SNP variations (fig. 
SA). Only one group showed significantly higher SNP variation 
(18.8%), and genes from this group exhibited at least two 
times higher expression only in Pokkali (upregulated) under 
NaCl stress (fig. SA). The data suggested that SNP variations 
contributed to expression divergence mainly under certain 
stress conditions. To investigate how SNPs contribute to ex- 
pression divergence, we further analyzed the difference in 
their overrepresented promoter motifs between these two 
genotypes. Promoter sequences from the group with 18.8% 
of SNP variations between these two genotypes were 



submitted to motif analysis (see Materials and Methods). We 
have detected three overrepresented promoter motifs and 
two of them were the same, and only one showed a differ- 
ence between these two genotypes. In Nipponbare, most of 
the motif sequence is CACACACGCA and in Pokkali, the motif 
is CACACACACA (fig. SB). In Nipponbare, the sequence en- 
codes a promoter motif named DPBFC0REDCDC3, which is 
an abscisic acid response motif (Finkelstein and Lynch 2000). 
However, in Pokkali, no motif was detected due to the SNP 
variation from G to A (fig. SB). The results suggest a contri- 
bution of SNPs to expression divergence. 

Because our data showed that SNP variations contribute 
to the expression divergence mainly for genes with two 
times higher Pokkali under NaCl (fig. SA), we also analyzed 
the contribution of indels and SVs to expression divergence. 
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Fig. 7. — Functional annotation and expression analysis of genes with alternative spliced models. (A) General outline of genes with alternative splicing 
models and their expression divergence. (B) — (£) Examples of genes with different types of expression divergence. (F) Overrepresented GO terms by GSEA in 
these genes with alternative spliced models. GO term annotation refers to the following: GO:0016787, hydrolase activity; GO:0044444, cytoplasmic part; 
GO:0005737, cytoplasm; GO:0044424, intracellular part; GO:0005739, mitochondrion; GO:0005622, intracellular. 



As no whole-genome sequencing data are available for the 
Pokkali genome, we have randomly selected 24 genes with 
differential expression between Nipponbare and Pokkali 
under either normal or NaCl conditions for promoter se- 
quencing analysis. Our data showed that 23 out of 24 
genes showed variations in their promoter sequences. As 
a result, their promoters showed a difference in at least 
one motif. One of such genes is LOC_Os08g44850. This 
gene showed a very low expression level in Pokkali (fig. 
80- However, in Nipponbare, this gene showed high ex- 
pression level and exhibited downregulated expression 
under NaCl stress (fig. 80- Sequencing data revealed mul- 
tiple indels and SVs in their promoter sequences besides 
SNPs (fig. 8D). As a result, 1 1 motifs lost and two motifs 
gained in Pokkali (fig. 8E). One of the lost motifs is 
TATABOX5, which might contribute to very low expression 
of this gene in Pokkali. 



Effect of Duplication and Transposition on Gene 
Expression under High Salinity Stress 

To explore the evolutionary basis of expression divergence 
within and between Nipponbare and Pokkali under normal 
and high salinity stress, we further analyzed the effect of 
both duplication and mobile elements on expression diver- 
gence. We have identified 5,111 tandemly and 6,244 
segmentally duplicated genes, 8,502 LTR-retrotransposon- 
related, 2,912 retrogenes, 679 CACTA element-related, 
3,707 Pack-MULE related, 60 Mr-related, and 81 Helitron- 
related genes in the rice genome (supplementary table S5, 
Supplementary Material online). We then analyzed their con- 
tribution to expression divergence under NaCl stress in both 
Nipponbare and Pokkali. In Pokkali, on average, we have de- 
tected 7.8% of genes that were regulated by NaCl stress. 
Among the analyzed two duplication models and six types 
of mobile elements, only segmental duplication significantly 
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Fig. 8. — Genomic variations of promoter sequences and their motif analysis between Nipponbare and Pokkali. (A) Comparative analysis of promoter 
SNP variations among different types of regulated genes under high salinity stress. (£) Overrepresented promoter motifs between Nipponbare and Pokkali as 
revealed by SNPs. The BioProspector program (Liu et al. 2001) was used to detect overrepresented promoter motifs. (O An example of a gene 
(LOC_Os08g44850) with expression divergence between Nipponbare and Pokkali. (D) Promoter sequencing and motif analysis of the gene 
LOC_Os08g44850. Motifs highlighted by blue fonts were Nipponbare (NB) specific, whereas these motifs with red fonts were Pokkali (PK) specific. (E) 
Species-specific motifs in Nipponbare and Pokkali and their annotation. 
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Fig. 9. — The effect of duplication/transposition on gene regulation and divergence. (A) The effect of duplication/transposition on gene regulation under 
high salinity stress. The average columns indicate the percentages of genes with up- or downregulated genes among total analyzed genes in either Pokkali or 
Nipponbare. (B) The effect of duplication/transposition on gene divergence between Nipponbare and Pokkali. The average columns indicate the percentages 
of genes with expression divergence between Nipponbare and Pokkali under either normal (CK) or high salinity stress conditions. The stars in (A) and (B) 
indicate statistical difference at P< 0.05. 



contributed to expression divergence and up to 11.2% of 
segmentally duplicated genes were up- or downregulated 
by high salinity stress (fig. 9A). However, in Nipponbare, 
higher expression divergence was observed in both tandemly 
and segmentally duplicated genes as well as Mr-related 
genes (fig. 9A). 

We then surveyed the effects of duplication and transpo- 
sition on gene expression divergence between Pokkali and 
Nipponbare. Under normal growth conditions, we have de- 
tected 6.9% of genes showing expression divergence 
between Pokkali and Nipponbare on average (fig. 9B). For 
CACTA and Mr-related genes, up to 9.5% and 16.7% of 
them showed expression divergence between these two ge- 
notypes. For duplicated genes and the remaining four types of 
mobile element-related genes, similar or lower percentages of 
them showed expression divergence under normal growth 
conditions (fig. 9B). However, a different situation was ob- 
served under NaCI stress conditions. On average, 10.3% of 



analyzed genes exhibited expression divergence under this 
stress (fig. 9B). We found that 11.5%, 12.6%, and 16.7% 
of genes expanded from tandem duplication, and CACTA 
and Mr mobile elements, respectively, were differentially ex- 
pressed between Pokkali and Nipponbare under NaCI stress, 
which were significantly higher than the average (fig. 9B). 
Thus, different gene duplication/expansion mechanisms dif- 
ferentially contributed to the expression divergence between 
Nipponbare and Pokkali or under high salinity stress. 

Discussion 

Custom Microarray Analysis Revealed More Genes with 
Detectable Expression Signal in the Rice Genomes under 
Normal or Stress Conditions 

How many genes are expressed in rice during seedling stage 
and what is the difference in gene expression profiling under 
certain stress? Large amounts of studies have been carried out 
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to figure out these questions. However, no definite answer 
has been achieved. Because of the limited availability of ex- 
pressed sequence tags (EST) and cDNA sequences at earlier 
stages, limited probes were printed for microarray analysis; 
thus, limited genes were analyzed (Kawasaki et al. 2001). 
Currently, widely used rice array chips were from Affymetrix 
(http://www.affymetrix.com, last accessed October 28, 2013). 
This array contains probes to query 51,279 transcripts repre- 
senting two rice cultivars, with approximately 48,564 japonica 
transcripts and 1,260 transcripts representing the indica culti- 
var. Most of the rice microarray analyses were based on these 
chips. On the other hand, the rice genome was completely 
sequenced and annotated in its encoding genes (http://rice. 
plantbiology.msu.edu; Kawahara et al. 2013). Based on the 
various Affymetrix microarray-based expression analyses, only 
around 23,821 annotated genes were expressed (Jiang and 
Ramachandran 2010). This might be partially due to the fact 
that no probe has been printed in the Affymetrix chips for 
many annotated rice genes. Thus, currently available commer- 
cial microarray chips are not enough to cover all rice genes and 
to outline their expression profiling. In this study, we have 
designed 55,078 probes representing 38,517 non-TE and 
14,278 TE genes using the Agilent 8 x 60 k format of micro- 
array chips. Our data revealed 41,681 genes in Nipponbare 
and 41,295 genes in Pokkali with detectable expression sig- 
naling under either normal or high salinity stress conditions 
(fig. 2E). In a previous report, around 31,382 genes were de- 
tected with expression signal using microarray, massively par- 
allel signature sequencing or by cDNA/EST (Jiang and 
Ramachandran 2010). By comparing these published data, 
our custom microarray analysis detected 16,230 (7,797 non- 
TE and 8,433 TE) genes in Nipponbare and 16,313 (7,863 
non-E and 8,450 TE) genes in Pokkali, which were not 
probed or detected with expression signal in previous studies 
(Jiang and Ramachandran 2010). On the other hand, we only 
used 250 mM NaCI solution for high salinity stress. Thus, more 
expressed genes including differentially expressed genes 
should be detected if higher concentrations of NaCI (e.g., 
300 mM) or a series of concentration gradients for salinity 
stress can be employed. Thus, more genes should have par- 
ticipated in rice growth and development as well as stress 
regulation during seedling stages. Furthermore, much more 
genes should be detected with expression signaling if more 
developmental stages and more abiotic or biotic stresses were 
investigated for gene expression analyses. Therefore, rice 
growth and development should be involved in a comprehen- 
sive gene regulation system with the participation of more 
than 40,000 genes. 

Expression Divergence Significantly Contribute to Genetic 
Diversity in Rice 

Genetic diversity has been assessed by morphological traits, 
physiological index, and molecular markers (Caldo et al. 



1996; Jahn et al. 2011; Zhang et al. 2011). With the 
rapid progress on sequencing technologies, the whole 
genome sequencing data have been used to evaluate ge- 
netic diversity (Huang et al. 2012; Xu et al. 2012). In addi- 
tion, expression divergence was also used to survey genetic 
diversity in animals and plants (Kliebenstein et al. 2006; 
Albert et al. 2012). However, in rice, little is known about 
genome-wide expression divergence of genes among differ- 
ent germplasms and its contribution to genetic diversity. We 
have identified more than 5,700 genes with differential ex- 
pression in their transcription abundance between 
Nipponbare and Pokkali under normal conditions or high 
salinity stress (fig. 3). We have also identified differentially 
regulated genes by NaCI stress between these two geno- 
types (fig. 4). Compared with genetic variations revealed by 
SNPs and their contribution to the divergence of biological 
functions of corresponding genes, the number of genes 
with expression divergence between these two genotypes 
is obviously higher. These data suggested that expression 
divergence should also be regarded as an index to evaluate 
genetic diversity in rice. Furthermore, studies showed that 
upregulation or downregulation of certain genes in trans- 
genic rice plants significantly altered their tolerance to salin- 
ity stress (Jiang and Ramachandran 2010; Nakashima et al. 
2012), and some of these genes also showed expression 
divergence between Nipponbare and Pokkali under normal 
and salinity stress conditions. The data suggested that ex- 
pression divergence might contribute to cultivar improve- 
ment in salinity tolerance during rice domestication. Our 
expression data might provide a reference for further im- 
proving rice salinity tolerance by selecting certain genes for 
genetic modification. 

Molecular Basis of Gene Expression Divergence between 
Nipponbare and Pokkali 

Generally, substantial differences in gene expression patterns 
between closely related species have been reported in sev- 
eral studies (Rifkin et al. 2003; Khaitovich et al. 2006; Tirosh 
et al. 2009, 2010; Rebeiz et al. 2011). Mechanisms under- 
lying these differences have not been fully understood yet. 
A promoter is required to initiate transcription of a gene 
and expression divergence might occur in case of promoter 
mutation. Under normal growth conditions, promoter SNP 
variations showed no statistical difference between corre- 
sponding genes with expression divergence and no expres- 
sion divergence (fig. 8). Similar results were also observed 
between genotypes Nipponbare and indica 93-1 1 (Zhang 
et al. 2008). Interestingly, our data showed that promoter 
SNP variations between Nipponbare and Pokkali were only 
detected in genes with higher level of expression in Pokkali 
under salinity stress (fig. SA). SNP-mediated mutation in 
promoter motifs might be regarded as a reason for expres- 
sion divergence (fig. SB). Besides SNP, indels and SVs of 
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promoter regions might also contribute to expression diver- 
gence. Out of the 24 differentially expressed genes between 
Nipponbare and Pokkali under normal and NaCI stress con- 
ditions, 23 of them were detected with promoter variations 
in either SNP/indel or SV. These variations significantly con- 
tributed to the change in promoter c/s-regulatory elements, 
thus, resulting in expression divergence (fig. 8C and D). 
Previous study also showed that promoter indels significantly 
contributed to the expression divergence between 
Nipponbare and 93-11 (Zhang et al. 2008). On the other 
hand, our data showed that a few differentially expressed 
genes between Nipponbare and Pokkali also have the same 
promoter sequence. The data suggest the contribution of 
DNA methylation or other mechanisms to gene expression 
divergence between these two genotypes. Both genetic and 
epigenetic factors might affect gene expression divergence 
among different genotypes and their contribution rates 
might vary depending on different organisms. Up to 
31.7% of genes with expression divergence showed no 
promoter variations between analyzed grain and sweet sor- 
ghum lines (Jiang, Ma, et al. 2013). Thus, in sweet and 
grain sorghum lines, DNA methylation might be regarded 
as an important factor to contribute to gene expression di- 
vergence (Jiang, Ma, et al. 2013). However, in Nipponbare 
and Pokkali, promoter variations from SNPs, indels, and SVs 
but not DNA methylation should be regarded as main fac- 
tors to contribute to expression divergence. 

Molecular Mechanism of Expression Divergence 

Sequencing data of indica and japonica rice genomes showed 
that the majority of tandem and segmental duplications as 
well as transposition events of mobile elements occurred 
beyond the divergence between these two subspecies (Goff 
et al. 2002; Yu et al. 2002). These duplication or transposition 
events provided a basis for expression divergence. We were 
interested in whether these events contributed to expression 
divergence under normal or in response to high salinity stress. 
We have analyzed the effect of tandem and segmental dupli- 
cations as well as six different mobile elements (LTR retrotran- 
sposon, non-LTR retrotransposon, CACTA element, MULE, 
hAT, and Helitron) on gene expression divergence. Our data 
showed that tandem and segmental duplication, CACTA and 
hAT elements significantly contributed to expression diver- 
gence under normal conditions or high salinity stress (fig. 9). 
Thus, duplication and mobile elements might partially contrib- 
ute to species divergence through expression regulation. In 
this study, we only analyzed the expression divergence 
under normal and high salinity stress at seedling stage. 
Higher expression divergence should be revealed between 
these two genotypes by investigating their expression profiling 
under additional abiotic and biotic stresses and at multiple 
developmental stages. 
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